Gaussian Process inference

7 March 2024 (updated 7 March 2024)

$$ \gdef\delim#1#2#3{\mathopen{}\mathclose{\left#1 #2 \right#3}} \gdef\p#1{\delim({#1})} \gdef\ps#1{\delim\{{#1}\}} \gdef\N{\mathcal N} \gdef\vec#1{\bm #1} \gdef\mat#1{\mathrm #1} \gdef\k{\mathrm k} \gdef\u{\mathrm u} \gdef\c{\mathrm c} \gdef\T{\mathrm T} $$

Given $\vec μ$ and $\mat Σ$ such that $\vec y$ is multivariate normal distributed.

$$ P\p{\vec y} ∼ \N\p{\vec μ, \mat Σ} $$

Given a draw of standard normal individually independent values $\vec n$, we can transform this to a draw from $\N\p{\vec μ, \mat Σ}$:

$$ \vec y = \vec μ + \mat L \cdot \vec n $$

Where $\mat L$ is the Cholesky factor such that $\mat L ⋅ \mat L^{\T} = \mat Σ$.

Computing the Cholesky decompostion

https://en.wikipedia.org/wiki/Cholesky_decomposition#The_Cholesky_algorithm

https://en.wikipedia.org/wiki/Cholesky_decomposition#The_Cholesky%E2%80%93Banachiewicz_and_Cholesky%E2%80%93Crout_algorithms

do i = 1, size(A,1)
    L(i,i) = sqrt(A(i,i) - dot_product(L(i,1:i-1), L(i,1:i-1)))
    L(i+1:,i) = (A(i+1:,i) - matmul(conjg(L(i,1:i-1)), L(i+1:,1:i-1))) / L(i,i)
end do

Inversion using Cholesky

Inference

Supposed we know a subset of values $\vec y_{\mathrm k}$ and we want to use this to infer about the remaining unknown values $\vec y_{\mathrm u}$.

$$ P\p{\vec y_{\u} \vert \vec y_{\k}} ∼ \N\p{\vec μ_{\c}, \mat Σ_{\c}} $$

This is again normally distributed with

$$ \begin{aligned} \vec μ_{\c} &= \vec μ_{\u} + Σ_{\k\u}⋅Σ_{\k\k}^{-1}⋅ \p{\vec y_{\k} - \vec μ_{\k}} \\ \mat Σ_{\c} &= \mat Σ_{\u\u} - Σ_{\u\k}⋅Σ_{\k\k}^{-1}⋅ \mat Σ_{\k\u} \end{aligned} $$

Where ${}_\k$ represents known indices and ${}_\u$ the unknown indices. To avoid inverting we want to re-use $\mat L$.

Note that the second matrix is also kind of an update:

$$ \mat Σ_{\c} = \mat Σ_{\u\u} - Σ_{\u\k} ⋅ \p{\mat L_{\k\k} ⋅ \mat L_{\k\k}^\T}^{-1} ⋅ \mat Σ_{\u\k}^\T $$

Efficient computation

Suppose we know $\mat L$. Can we use this to quickly compute $\mat L_{\k\k}$? This amounts to computing the Cholesky factor of a subset of the matrix. This can be done using repeated single-row deletion update:

https://en.wikipedia.org/wiki/Cholesky_decomposition#Adding_and_removing_rows_and_columns

Can we also compute $\mat L_{\c}$ efficiently?